perm filename FOO[NSF,MUS]1 blob sn#096541 filedate 1974-04-10 generic text, type T, neo UTF8
.SELECT A
2. SIMULATION OF LOCALIZED SOUND SOURCES
.GROUP SKIP 2
.SELECT C
CURRENT RESEARCH
.SELECT 1
.BEGIN FILL ADJUST
The simulation of reverberant spaces, as described above, has led to
research into the localization cues for a simulated source
within the space.  The control of  the reverberant signals  emanating
from two or more loudspeakers allows  the simulation of a space which
is largely independent of the actual reflecting surfaces of the space
in which the  loudspeakers are located.   The natural consequence  of
this illusion, then,  is the arbitrary placement of  a signal at some
point in the illusory space which  may, in fact, be beyond the  walls
of the physical space.

Our  interest is  in  the  further  development of  techniques  which
require  the minimum  number of  loudspeaker-channels to  provide the
perceptual cues  for azimuth,  distance, and  altitude  in a  natural
environment. There have been a  number of studies into the perception
of  location of a sound  source (see Mills, 1972, for a comprehensive
review of the psychoacoustic literature on auditory localization).  
Most findings are
based on unnatural listening conditions where constraints are placed
on  the  environment or  the  listener or  both:  a  large number  of
loudspeakers, anechoic chamber, headphones, or fixed position of  the
listener. The conditions  we assume in  our research are  few speaker
channels, a room  having a moderate reverberation time, a space which
will accommodate a number of listeners, and no headphones.

We first  describe the simulation  of the cues  for azimuth,
distance, and velocity as currently implemented in our system.
These cues are demonstrated on sound Example 5.
A more complete
description of these techniques is presented in Appendix F.
.END
.GROUP SKIP 2
.SELECT 5
simulation of azimuth and distance cues
.SELECT 1
.BEGIN FILL ADJUST
The ability of  a listener to  localize a  sound source is  dependent
upon  cues  for  three  dimensions:  azimuth  or  horizontal  angular
displacement,    distance,   and   altitude   or   vertical   angular
displacement.  Of  the three the  last is the  least critical in  our
ordinary (horizontal) environment and is the least discriminated.

The cues for angular localization are: 1) the different arrival times
of  the signal to  the two  ears when the  source is not  centered in
front of or behind the listener,  2) the pressure-level difference of
short wavelength signals  at the two ears resulting  from the shadow
effect of the head when the signal is not centered, and 3) cues,
currently not well understood, provided by the asymmetry of the
pinna for very high frequency energy.

The cues  for the  distance of  a source  from a  listener, when  the
distance is greater than  a few feet, are: 1) the ratio of the direct
signal energy  to  the reverberant  signal  energy where  the  direct
signal decreases in  intensity inversely according to  the square of
the  distance, 2) the loss  of high frequency components
with increasing distance between the signal source and the listener, and
3) the loss of detail in the signal with increasing distance, as low-intensity
components are lost.
.END
.BEGIN FILL ADJUST
At present, our simulation system consists of four loudspeakers which
are independently driven by the digital-to-analog converter output of the computer.
For the purpose of localization simulation, the
speaker-channels are arranged
in a square and the listener is assumed to be at the center, equidistant
from the four loudspeakers, as shown in Figure 1, Appendix F.  Adjacent speaker
pairs form an angle of 90 degrees relative to the listener.

In order to simulate the azimuthal cue, we distribute the sound to
one adjacent pair of speakers at a time. When simulating a source directly
behind one speaker, all of the signal comes out of that speaker. As the
source moves from one speaker to another, we decrease the amount of signal
in the speaker the source is leaving and increase the amount of signal
in the speaker the source is approaching.
The particular functions we use to distribute the sound are
%4Q%1/%4Q%8max%1 for one speaker and (1-%4Q%1/%4Q%8max%1) for the other where
%4Q%1 is the angle of displacement and %4Q%8max%1 is equal to 90 degrees.
Because the computations are based on the ideal position of the listener,
there is a positional distortion of the phantom image
for any other listener.  This distortion is constant for any given position and
has not been found to be objectionable unless a listener is very close to
one of the loudspeakers.
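
The distribution described above can be sketched in a few lines of Python.
This is an illustration only, not the code of the system itself.  The
speaker numbering (speaker k placed at k times 90 degrees) and the
function name are assumptions, and the angle of displacement is taken to
be measured from the speaker the source is leaving.

```python
SPEAKER_SPACING_DEG = 90.0  # theta_max: angle between adjacent speakers
NUM_SPEAKERS = 4


def azimuth_gains(azimuth_deg):
    """Return one gain per speaker for a source at the given azimuth.

    Assumption: speaker k sits at k * 90 degrees around the listener.
    """
    azimuth = azimuth_deg % 360.0
    sector = int(azimuth // SPEAKER_SPACING_DEG)
    # theta: displacement from the speaker the source is leaving
    theta = azimuth - sector * SPEAKER_SPACING_DEG
    fraction = theta / SPEAKER_SPACING_DEG  # theta / theta_max

    gains = [0.0] * NUM_SPEAKERS
    gains[sector % NUM_SPEAKERS] = 1.0 - fraction    # speaker being left
    gains[(sector + 1) % NUM_SPEAKERS] = fraction    # speaker being approached
    return gains


for angle in (0, 30, 45, 90, 315):
    print(angle, azimuth_gains(angle))
```

Only two entries are ever non-zero, and they always sum to one, so the
total signal sent to the pair is the same at every angle.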

The simulation of the distance cue is dependent upon the availability of
artificial reverberation.  For the simplest case, the reverberation is set
to be constant in intensity and the direct signal is scaled to be inversely
proportional to the distance in question.  For simplicity, the unit distance, 1,
is assigned to the midpoint of the line joining two adjacent
speakers, which is, therefore, the nearest point which can be simulated.
With increasing distance, the reverberant signal remains constant while the direct
signal decreases in intensity, thereby meeting the principal requirement for
the distance cue: a change in ratio of direct to reverberant energy.

The first extension of this technique involves a scaling for the reverberant
signal with distance as well. The reverberant signal is
attenuated according to a function which decreases less rapidly with
distance, for example, in inverse proportion to the square root of the distance.
As the source moves away, the total sound from the source, including
reverberation, will decrease.
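
Both versions of the distance scaling can be illustrated with a short
Python sketch.  It assumes that the scale factors are applied to signal
amplitude; the reverberation level of 0.2 and all names are hypothetical
values chosen for the example.

```python
import math


def distance_gains(distance, reverb_level=0.2, scale_reverb=True):
    """Return (direct_gain, reverb_gain) for a source at `distance` >= 1.

    reverb_level is a hypothetical reverberation gain at unit distance.
    """
    if distance < 1.0:
        raise ValueError("distance 1 is the nearest point that can be simulated")
    direct = 1.0 / distance
    if scale_reverb:
        # Extension: reverberation falls off with the square root of distance.
        reverb = reverb_level / math.sqrt(distance)
    else:
        # Simplest case: reverberation is constant.
        reverb = reverb_level
    return direct, reverb


for d in (1, 2, 4, 8, 16):
    direct, reverb = distance_gains(d)
    print(f"distance={d:2d}  direct={direct:.3f}  reverb={reverb:.3f}  "
          f"ratio={direct / reverb:.2f}")
```

In either case the ratio of direct to reverberant signal falls as the
distance grows; with the square-root scaling the total level falls as well.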

There is another important detail in our current technique for the simulation of
the distance cue.  At distances beyond the echo radius (that distance where
the intensities of the direct and reverberant signals are equal) the direct
signal becomes masked by the reverberant signal, thereby eliminating the
azimuthal information.  In order to overcome this deficiency we divide the
reverberant signal into two parts: 1) Global, which is distributed equally
to all channels, but which is now attenuated with distance of the direct signal
according to 1/distance; 2) Local, which is distributed between speaker pairs
with the direct signal and is increased with distance of the direct signal
according to 1-(1/distance).  Thus, when the source is
close to the listener, the reverberant signal is equally distributed in
all channels and as the source moves away, 
the reverberant signal becomes concentrated in the direction of the
source.

Note that this localization of reverberation is in addition to the
scaling for distance, which is inversely proportional to the square root
of the distance. The reverberation, thus, has two attenuation factors:
one which is equal in all speakers, and one which favors
one pair of speakers.

The direct signal itself also has two attenuation factors: one which attenuates the
signal with distance, and one which distributes the direct signal to two
adjacent speakers.
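
The four factors can be combined per speaker as in the following Python
sketch.  It is an illustration under stated assumptions, not the code of
the system: speaker k is placed at k times 90 degrees, the factors are
treated as amplitude gains, the global part is applied in full to every
channel rather than divided among them, and reverb_level is a
hypothetical reverberation gain at unit distance.

```python
import math


def speaker_gains(azimuth_deg, distance, reverb_level=0.2):
    """Return (direct, reverb): two lists of four per-speaker gains."""
    if distance < 1.0:
        raise ValueError("distance 1 is the nearest point that can be simulated")

    # Azimuth factor: linear split over the adjacent speaker pair.
    azimuth = azimuth_deg % 360.0
    sector = int(azimuth // 90.0)
    fraction = (azimuth - 90.0 * sector) / 90.0
    pair = [0.0] * 4
    pair[sector % 4] = 1.0 - fraction
    pair[(sector + 1) % 4] = fraction

    direct_scale = 1.0 / distance                   # direct signal vs. distance
    reverb_scale = reverb_level / math.sqrt(distance)  # reverberation vs. distance
    global_part = 1.0 / distance                    # equal in all channels
    local_part = 1.0 - 1.0 / distance               # follows the direct signal

    direct = [direct_scale * w for w in pair]
    reverb = [reverb_scale * (global_part + local_part * w) for w in pair]
    return direct, reverb


direct, reverb = speaker_gains(azimuth_deg=45.0, distance=4.0)
print("direct:", direct)
print("reverb:", reverb)
```

At unit distance the local part vanishes and the reverberation is the
same in all four channels; at large distances the global part vanishes
and the reverberation follows the direct signal into one speaker pair.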
.END
.GROUP SKIP 2
.SELECT 5
moving sources and velocity cue
.SELECT 1
.BEGIN FILL ADJUST
The localization technique has been extended to include the simulation
of a moving sound source.  This capability is a very powerful potential
tool for tuning the simulation algorithms for localization cues.

A special program has been written which allows the user to specify an
arbitrary path in a two-dimensional space by means of a light pen
or a computed geometry.  The program evaluates the trajectory and then derives
the time functions which control distance and angle for the simulation.
There is an additional component which is present in the case of
a moving source and which is derived by the program: Doppler shift of
the frequency as a result of the radial velocity of the source relative
to the listener.  This frequency shift has been found to be an essential
cue in the simulation of moving sources.  The program then applies
these computed functions to the synthesized signal, which can result in convincing
illusions of spatial movement for the listener.
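
The derivation of the control functions from a trajectory can be sketched
in Python as follows.  This is not the program described above; it is a
minimal illustration which assumes that the path is given as sampled
(x, y) positions in metres with the listener at the origin, that azimuth
is measured counterclockwise from the x axis, and that the speed of sound
is 343 metres per second.  The radial velocity is estimated by finite
differences of the distance.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second; an assumed value, not from the text


def control_functions(path, dt):
    """Derive distance, azimuth (degrees) and Doppler ratio for each sample.

    path: list of (x, y) positions sampled every dt seconds.
    The Doppler ratio multiplies the source frequency.
    """
    if len(path) < 2:
        raise ValueError("need at least two path points")
    distances = [math.hypot(x, y) for x, y in path]
    azimuths = [math.degrees(math.atan2(y, x)) % 360.0 for x, y in path]

    doppler = []
    last = len(path) - 1
    for i in range(len(path)):
        before, after = max(i - 1, 0), min(i + 1, last)
        # Radial velocity: positive when the source recedes from the listener.
        radial_velocity = (distances[after] - distances[before]) / ((after - before) * dt)
        doppler.append(SPEED_OF_SOUND / (SPEED_OF_SOUND + radial_velocity))
    return distances, azimuths, doppler


# A source passing the listener on a straight line, 2 metres to one side.
path = [(float(x), 2.0) for x in range(-20, 21, 5)]
for d, a, r in zip(*control_functions(path, dt=0.25)):
    print(f"distance={d:6.2f}  azimuth={a:6.1f}  doppler={r:.4f}")
```

The ratio is greater than one while the source approaches and less than
one while it recedes, which gives the familiar drop in pitch as the
source passes.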
.END
.GROUP SKIP 2
.SELECT C
PROPOSED RESEARCH
.SELECT 1
.BEGIN FILL ADJUST
The programs and techniques described briefly above, and more fully in
Appendix F, have proven to be sufficiently powerful to convince us
that they include all of the significant cues for localization and that
the method of control is valid.  There is, nevertheless, a great amount
of research to be done relative to the internal algorithms for the
individual cues and the optimum number and relative positions of the 
speaker-channels.  In their current state, they represent the obvious
approximations to the natural physical processes.  It may very well be,
however, that special distortions of the space and/or amplitude relationships
may provide a significant enhancement of the localization cues when projected
through a few loudspeakers.  In this section we propose further research
by examining each of the cues independently.
.END
.GROUP SKIP 2
.SELECT 5
azimuth
.SELECT 1
.BEGIN FILL ADJUST
We have found the cue for azimuth to be the most problematical in simulations
using four loudspeakers.  The energy is distributed between speakers to
provide a phantom source for the listener at the center of the space
circumscribed by the loudspeakers.  The centrally positioned listener can
perceive a spatial distribution of 360 degrees.
For positions of increasing distance from the center,
the closest loudspeaker increasingly dominates
the phantom source, until the %5worst case%1, a position
next to a loudspeaker, where the listener perceives a spatial
distribution of only 90 degrees.  The perceived space, then, decreases monotonically
as the listener moves farther from the center, but the exact nature of the
function is as yet unknown.

With five loudspeakers arranged in a circle, the %5worst case%1 position allows
a perceived space of 108 degrees, an increase of 20 per cent; six speakers allow 120 degrees,
eight speakers 135 degrees, etc.  There are clearly diminishing returns with
additional speaker-channels.
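
The figures quoted above agree with the interior angle of a regular
polygon having one speaker at each vertex, 180(n-2)/n degrees.  That the
figures were obtained in this way is an inference from the numbers, not
a statement in the text.  A short Python sketch shows how the gain from
each added speaker shrinks.

```python
def worst_case_angle(num_speakers):
    """Interior angle, in degrees, of a regular polygon with one speaker per vertex."""
    if num_speakers < 3:
        raise ValueError("at least three speakers are needed to enclose a space")
    return 180.0 * (num_speakers - 2) / num_speakers


previous = None
for n in range(4, 13):
    angle = worst_case_angle(n)
    gain = "" if previous is None else f"  (+{angle - previous:.1f})"
    print(f"{n:2d} speakers: {angle:6.1f} degrees{gain}")
    previous = angle
```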

We plan to investigate thoroughly the area of the
listener space within which the spatial distortion for off-center positions
remains acceptable, and how that usable
area increases with the addition of speaker-channels.  Of particular importance
is the cost-effectiveness of n speaker-channels.

The problem of optimum number and placement of speaker-channels relates also
to the simulated reverberant space, described in section IIB1, above.
The perceptual criteria for realistic reverberation and azimuth cues are
different, however.  For reverberation, the perceptual impression of
diffuse or unlocalized sound is desired, whereas for the azimuth cue of
a source, the utmost localization is desired.  It is probable that the number
of speaker-channels required for diffuse reverberation is smaller than the
number required for azimuthal localization.  We will consider, therefore,
optimizations in our simulation algorithms to maintain the minimum number
of uncorrelated reverberant speaker-channels independent of the number
required for the most effective simulation of azimuth.

In the simulation algorithm we distribute the energy between speaker pairs
in proportion to the angle of displacement.  It may be that other functions
can be used to better "fill the hole" between speakers, e.g. the tangent of
the angle of displacement.  We plan to apply, here, the multi-dimensional
scaling techniques to subjective evaluations of various modifications
to the algorithm.
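
As an illustration of what such a comparison might involve, the Python
sketch below tabulates the linear weight next to one tangent-based
alternative.  The text does not give the exact form of the tangent
function; the expression used here, 0.5(1 + tan(theta - theta_max/2)),
is a hypothetical choice which merely shares the endpoints and midpoint
of the linear rule.

```python
import math

THETA_MAX_DEG = 90.0  # angle between adjacent speakers


def linear_weight(theta_deg):
    """Weight of the approached speaker under the linear rule."""
    return theta_deg / THETA_MAX_DEG


def tangent_weight(theta_deg):
    """Hypothetical tangent-based weight; not a formula from the text."""
    offset = math.radians(theta_deg - THETA_MAX_DEG / 2.0)
    return 0.5 * (1.0 + math.tan(offset))


for theta in (0.0, 15.0, 30.0, 45.0, 60.0, 75.0, 90.0):
    print(f"theta={theta:4.0f}  linear={linear_weight(theta):.3f}  "
          f"tangent={tangent_weight(theta):.3f}")
```

The two curves agree at 0, 45, and 90 degrees and differ in between;
whether either better fills the space between speakers is a matter for
the subjective evaluations proposed above.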
.END
.GROUP SKIP 2
.SELECT 5
distance
.SELECT 1
.BEGIN FILL ADJUST
The distance cue is the most convincing cue in our simulations
because it is independent of both the number of speaker-channels and the
position of the listener within the space.  Our algorithm, as stated previously,
is based upon the attenuation of the intensity of the signal in inverse proportion
to the square of the distance.  This relation for the direct signal seems
to be inviolate and effective.

The attenuation of the reverberant signal
with increasing distance of the direct signal does not suggest
an absolute relationship equivalent to that of the direct signal.  In a small
space, the overall intensity of the reverberant signal changes little,
whereas in a large
space the change may be significant.  In order to evaluate the perceptual
significance of this changing reverberation, we propose to make four-channel
recordings of signals at a variety of distances in a variety of spaces
and analyze these signals by means of computer analysis techniques.
Evaluation of a number of cases should provide us insight into the amount
and perceptual effect of the change in the reverberant signal as a
source moves away from the listener.  It could well be that the change
in the amount is not as great as the change in the directional emphasis,
since, in moving away from the listener, the source is moving toward
some reflecting surfaces and away from others.

Our experience has shown that the distance cue becomes obscure when the
total amount of reverberant energy is great and the simulated source is
not in immediate proximity to the listener.  In addition, it is obvious
that with no artificial reverberation, any attenuation for distance imposed
on a source signal will be perceived only as a difference in loudness
and not distance, since the only reverberation is then that which is
natural to the listening room, and it changes in intensity in constant
proportion to the change in intensity of the source signal.  We propose
to determine the maximum and minimum ratios of reverberant to direct
signals which will fully preserve the distance cue.
.END
.GROUP SKIP 2
.SELECT 5
location of source as indication of room size
.SELECT 1
.BEGIN FILL ADJUST
In addition to the room information described in section IIB1 above,
the localization cues of a simulated source can also provide information about the room.
The cue for distance, for example, can be either in contradiction to,
or confirmation of, room information carried in the reverberation itself.
As an example of contradictory cues, very short first delays could be projected
from the listener's left, indicating a near reflecting surface,
while from the same direction the direct-signal cue for distance indicates a distant source.
We see this inherent power of digital synthesis to control cues precisely and independently
as being of enormous usefulness in establishing their potency
and relative dominance.
.END
.GROUP SKIP 2
.SELECT 5
altitude
.SELECT 1
.BEGIN FILL ADJUST
It is apparent that the most powerful realization of simulated localization
and reverberant spaces must also include cues for altitude.  Localization studies
on the vertical plane show that the processing by the pinnae of
frequencies greater than 7000 Hz provides the critical cue.
In addition, there are large spaces, most notably
cathedrals, where the vertical reverberant component makes a clear contribution
to the subjective impression of the space.  We propose to investigate, therefore,
arrangements of speaker-channels which most effectively and efficiently
provide such an impression.  An obvious arrangement of four speaker-channels
would be in a tetrahedron.  For an ideally located listener, four may suffice;
however, in order to accommodate a larger listener space, a larger number is
most probably required.
.END
.GROUP SKIP 2
.SELECT 5
perceptual scaling and testing
.SELECT 1
.BEGIN FILL ADJUST
We are planning to use psychological scaling techniques to
determine the  relative weightings  of  the various  cues for
localization and the perception of  the motion of sound sources.
Of particular importance are:
the optimal amount of  Doppler shift needed to create
specific images in particular contexts;
the perceptual  scale which describes  the relationship  between apparent
distance of the source and the ratio of direct-to-reverberant  signal;
and the interactions with this scale of the particular azimuth, the
absolute amount and time of reverberation, and other variables
affecting the relation of distance to the direct-to-reverberant signal ratio.
Multidimensional scaling techniques, described  in Appendix B, may be
of use in this regard. The spatial model for perceived relationships
seems especially applicable.
.END
.NEXT PAGE